Big data need big theory too

Authors

  • Peter V Coveney
  • Edward R Dougherty
  • Roger R Highfield
Abstract

The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease with which digitized data can be acquired in virtually all fields of endeavour, from science, healthcare and cybersecurity to economics, social sciences and the humanities. In multiscale modelling, machine learning appears to provide a shortcut to reveal correlations of arbitrary complexity between processes at the atomic, molecular, meso- and macroscales. Here, we point out the weaknesses of pure big data approaches with particular focus on biology and medicine, which fail to provide conceptual accounts for the processes to which they are applied. No matter their 'depth' and the sophistication of data-driven methods, such as artificial neural nets, in the end they merely fit curves to existing data. Not only do these methods invariably require far larger quantities of data than anticipated by big data aficionados in order to produce statistically reliable results, but they can also fail in circumstances beyond the range of the data used to train them because they are not designed to model the structural characteristics of the underlying system. We argue that it is vital to use theory as a guide to experimental design for maximal efficiency of data collection and to produce reliable predictive models and conceptual knowledge. Rather than continuing to fund, pursue and promote 'blind' big data projects with massive budgets, we call for more funding to be allocated to the elucidation of the multiscale and stochastic processes controlling the behaviour of complex systems, including those of life, medicine and healthcare.

This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'.
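The abstract's claim that curve-fitting methods can fail beyond the range of their training data can be illustrated with a toy example. The sketch below is not from the article: the quadratic `true_process`, the training range and the straight-line model are all hypothetical choices made only for illustration. A least-squares line is fitted to data drawn from the interval [0, 1] and then asked to predict at x = 10.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def true_process(x):
    # Hypothetical "underlying system": a simple quadratic law (an assumption).
    return x * x


# Training data cover only the narrow range 0.0 .. 1.0.
train_x = [i / 10 for i in range(11)]
train_y = [true_process(x) for x in train_x]

a, b = fit_line(train_x, train_y)


def predict(x):
    # The fitted model knows nothing about the quadratic structure.
    return a + b * x


def abs_error(x):
    return abs(predict(x) - true_process(x))


# Largest error at any training point versus error far outside the range.
max_in_range_error = max(abs_error(x) for x in train_x)
out_of_range_error = abs_error(10.0)

print("max error inside training range:", max_in_range_error)
print("error at x = 10 (outside range):", out_of_range_error)
```

Inside the training range the line tracks the data reasonably well, but at x = 10 the error is hundreds of times larger, because the fitted model does not encode the structure of the process that generated the data. A model built on the correct functional form would extrapolate without this penalty.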


Similar resources

Leveraging Big Data for Competitive Advantage

Big data means data sets which are too large, too unstructured and too fast changing, to use traditional data management methods. Enterprises that want to collect and process this data need new solutions for data processing and analysis. The aim of this paper is to identify the potential of big data analytics (BDA) as a source of competitive advantage of manufacturing companies in the market. T...


Big Data Begets Big Database Theory

Industry analysts describe Big Data in terms of three V’s: volume, velocity, variety. The data is too big to process with current tools; it arrives too fast for optimal storage and indexing; and it is too heterogeneous to fit into a rigid schema. There is a huge pressure on database researchers to study, explain, and solve the technical challenges in big data, but we find no inspiration in the ...


Correlation of Big Data with Supply Chain Health Performance in Employees of the Tehran Intelligent Fuel System

Introduction: The dramatic growth of big data and its application in preventing waste of resources and increasing financial performance and supply chain health levels, need to be examined from different perspectives. This study aimed to determine the correlation between big data and supply chain health performance in employees of Tehran Intelligent Fuel System. Methods: In this descriptive cor...


An Architecture for Security and Protection of Big Data

The issue of online privacy and security is a challenging subject, as it concerns the privacy of data that are increasingly more accessible via the internet. In other words, people who intend to access the private information of other users can do so more efficiently over the internet. This study is an attempt to address the privacy issue of distributed big data in the context of cloud computin...


Sometimes Too Big: Compressing trajectory Data

In the regime of “Big Data”, data compression techniques take crucial part in preparation phase of data analysis. It is challenging because statistical properties and other characteristics need to be preserved while the size of data need to be reduced. In particular, to compress trajectory data, movement status (such as position, direction, and speed etc.) need to be retained. In this paper, we...



Journal title:

Volume 374, Issue

Pages -

Publication date: 2016